A generalisation of the method of regression calibration and comparison with Bayesian and frequentist model averaging methods

For many cancer sites low-dose risks are not known and must be extrapolated from those observed in groups exposed at much higher levels of dose. Measurement error can substantially alter the dose-response shape and hence the extrapolated risk. Even in studies with direct measurement of low-dose exposures measurement error could be substantial in relation to the size of the dose estimates and thereby distort population risk estimates. Recently, there has been considerable attention paid to methods of dealing with shared errors, which are common in many datasets, and particularly important in occupational and environmental settings. In this paper we test Bayesian model averaging (BMA) and frequentist model averaging (FMA) methods, the first of these similar to the so-called Bayesian two-dimensional Monte Carlo (2DMC) method, and both fairly recently proposed, against a very newly proposed modification of the regression calibration method, the extended regression calibration (ERC) method, which is particularly suited to studies in which there is a substantial amount of shared error, and in which there may also be curvature in the true dose response. The quasi-2DMC with BMA method performs well when a linear model is assumed, but very poorly when a linear-quadratic model is assumed, with coverage probabilities both for the linear and quadratic dose coefficients that are under 5% when the magnitude of shared Berkson error is large (50%). For the linear model the bias is generally under 10%. However, using a linear-quadratic model it produces substantially biased (by a factor of 10) estimates of both the linear and quadratic coefficients, with the linear coefficient overestimated and the quadratic coefficient underestimated. FMA performs as well as quasi-2DMC with BMA when a linear model is assumed, and generally much better with a linear-quadratic model, although the coverage probability for the quadratic coefficient is uniformly too high. However both linear and quadratic coefficients have pronounced upward bias, particularly when Berkson error is large. By comparison ERC yields coverage probabilities that are too low when shared and unshared Berkson errors are both large (50%), although otherwise it performs well, and coverage is generally better than the quasi-2DMC with BMA or FMA methods, particularly for the linear-quadratic model. The bias of the predicted relative risk at a variety of doses is generally smallest for ERC, and largest for the quasi-2DMC with BMA and FMA methods (apart from unadjusted regression), with standard regression calibration and Monte Carlo maximum likelihood exhibiting bias in predicted relative risk generally somewhat intermediate between ERC and the other two methods. In general ERC performs best in the scenarios presented, and should be the method of choice in situations where there may be substantial shared error, or suspected curvature in the dose response.


Introduction
Moderate and high doses of ionising radiation are well established causes of most types of cancer 1,2 .There is emerging evidence, particularly for leukaemia and thyroid cancer, of risk at low dose (<0.1 Gy) radiation [3][4][5][6] (roughly 50 times the dose from background radiation in a year).For most other cancer endpoints it is necessary to assess risks via extrapolation from groups exposed at moderate and high levels of dose [7][8][9][10][11][12][13] .Such extrapolations, which are dependent on knowing the true dose-response relationship, as inferred from some reference moderate/high-dose data (very often the Japanese atomic bomb survivors), are subject to some uncertainty, not least that induced by systematic and random dosimetric errors that may be present in that moderate/highdose data 1,14 .Extensive biostatistical research over the last 30 years have done much to develop understanding of this issue [15][16][17][18][19][20][21][22][23][24][25][26][27][28][29][30] and in particular the role played by various types of dose measurement error 31 .Among the simplest methods of correction for dose error, regression calibration, which entails substitution of the conditional expectation of the true dose given the observed dose, is straightforward to apply, and is often used to correct for classical error 31 .
However, it only takes account of the 1 st order dose error terms in the Taylor expansion of the likelihood, and does not take account of correlations between dose errors 32 .It is also prone to bias when dose errors are large, or errors are differential, or the dose response has substantial curvature 33 .Other methods of correction for dose error, in particular Markov Chain maximum likelihood (MCML) 25,26,30,34 , and fully Bayesian methods [21][22][23]29 , both of which take full account of the uncertainty in doses (in particular 2 nd and higher order dose error terms in the Taylor expansion of the likelihood), can work better in these circumstances.
A variant of the commonly used method of regression calibration 31 has been very recently proposed which is particularly suited to studies with substantial shared error with dose response non-linearity 32 .This so-called extended regression calibration (ERC) method can be used in settings where there is a mixture of Berkson and classical error 32 .In fits to synthetic datasets in which there is substantial upward curvature in the true dose response, and varying (and sometimes substantial) amounts of classical and Berkson error, the ERC method generally outperformed both standard regression calibration, MCML and unadjusted regression, particularly with respect to the coverage probabilities of the quadratic coefficient, and for larger magnitudes of the Berkson error, whether this is shared or unshared 32 .
A Bayesian model averaging (BMA) method has been also recently proposed, the so-called 2-dimensional Monte Carlo with Bayesian model averaging (2DMC with BMA) method 28 , which has been used in fits to radiation thyroid nodule data 35 .The so-called frequentist model averaging (FMA) model has also been recently proposed, although only fitted to simulated data 36 .In the present paper we shall assess the performance of a variant implementation of the 2DMC with BMA method, which is more closely aligned with standard implementations of BMA 37 , and FMA against ERC, making also comparisons with other methods of correction for dose error using simulated data.The simulated data used is exactly as in the previous report 32 .

Synthetic data used for assessing corrections for dose error
The methods and data exactly parallel those of the previous paper, 32 using publicly available Life Span Study (LSS) leukaemia data 38 to guide construction of a number of artificial datasets.
Specifically we used the person year distribution by bone marrow dose groups 0-0.07, 0.08-0.19,0.20-0.99,1.00-2.49,≥2.50 Gy.The central estimates of dose we assumed are close to the person year weighted means of these groups, and as given in Supplement A The variables , , , , , Poisson linear relative risk generalised linear models 39 were fitted to this grouped data, with rates given by expression (3), using as offsets the previously-specified number per group 32 .Models were fitted using six separate methods (unadjusted, regression calibration, ERC, MCML, 2DMC and BMA, FMA).For ERC and the other methods previously used the methods of deriving doses and model fitting were as in our earlier paper 32 .It should be noted that for all except the unadjusted regressions the true (rather than surrogate) dose is used, so that classical error does not figure.Only for unadjusted regression is it the surrogate dose that is used.
We used a BMA method somewhat analogous to the 2DMC with BMA method of Kwon et al 28 , using the full set of mean true doses per group as previously generated for MCML, the mean doses per group for each simulation being given by group means of the samples generated by expression (3), averaged over the 1000 m  dose samples.The model was fitted using Bayesian Markov Chain Monte Carlo (MCMC) methods.Associated with the dose vector is a vector of probabilities , 1,...,1000 j p j  which is generated using variables , 1,...,999 j j

 
, so that: This is therefore quite close to the method proposed by Hoeting et al 37 , and somewhat distinct from the formulation of 2DMC with BMA proposed by Kwon et al 28 , as we discuss at greater length below.For this reason we shall describe our own method as quasi-2DMC with BMA.The standard formulation of BMA, as given by Hoeting et al 37 , and which we employ, is based on the posterior probability: the precise form of SAMC that was used by them 28 , and for that reason we have adopted this alternative.
All main model parameters ( , , , k     ) had normal priors, with mean 0 and standard deviation (SD) 1000.The Metropolis-Hastings algorithm was used to generate samples from the posterior distribution.Random normal proposal distributions were assumed for all variables, with SD of 0.2 for  and 1 for ,   , and SD 2 for all k  .The k  were proposed in blocks of 10.Two separate chains were used, in order to compute the Brooks-Gelman-Rubin (BGR) convergence statistic 41,42 .The first 1000 simulations were discarded, and a further 1000 simulations taken for sampling.The proposal SDs and number of burn-in sample were chosen to give mean BGR statistics (over the 500 simulated datasets) that were in all cases less than 1.03 and acceptance probabilities of about 30% for the main model parameters ( , ,    ), suggesting good mixture and likelihood of chain convergence.For the ERC model confidence intervals were (as previously) derived using the profile likelihood 39 and for the quasi-2DMC with BMA model Bayesian uncertainty intervals were derived.
The FMA model of Kwon et al 36 , and each such sample given an AIC-derived weight via: [It should be noted that Eq. ( 6) differs from the formula mistakenly given by Kwon et al 36 This was evaluated for two values of predicted dose, 0.1 Gy

Results
As shown in Table 1, using the linear-quadratic model the coverage probabilities of the ERC method for the linear coefficient  are near the desired 95% level, irrespective of the magnitudes of assumed Berkson error, whether shared or unshared.However, the ERC method yields coverage probabilities that are somewhat too low when shared and unshared Berkson errors are both large (with logarithmic SD=50%), although otherwise it performs well (Table 1).
It should be noted that classical error will have no effect on any of these models, as its only effect is on the unadjusted regression model (via sampling of the surrogate dose), and so the effect is not shown.By contrast the coverage probabilities of both the linear coefficient  and the quadratic coefficient  for the quasi-2DMC with BMA method are generally much too low, and when shared Berkson error is large (50%) the coverage probabilities do not exceed 5% (Table 1).The coverage for the FMA method is generally better, and for the coefficient  does not depart too markedly from the desired 95%; however, the coverage of the coefficient  tends to be too high, for any non-zero level of Berkson error, whether shared or unshared (Table 1).
Table 2 shows that for the linear model the coverage percentage is generally too high for ERC, MCML and FMA, and slightly too low for quasi-2DMC with BMA, but more or less correct for regression calibration.
Table 3 shows the coefficient mean values, averaged over all 500 simulations, assuming a linear-quadratic model.A notable feature is that for larger values of Berkson error, the linear coefficient  for quasi-2DMC with BMA is substantially overestimated, and the quadratic dose coefficient  substantially underestimated, both by factors of about 10.For ERC the estimates of the quadratic coefficient  are upwardly biased, but not by such large amounts.For FMA both coefficients have pronounced upward bias, particularly for large shared Berkson error (50%) (Table 3).
Table 4 shows that the bias in ERR for ERC evaluated either at 0.1 Gy or 1 Gy does not exceed 30% in absolute value.Regression calibration performs somewhat worse, with bias ~60% when shared and unshared Berkson error are large (50%) for predictions at 1 Gy, although otherwise with bias under 30%.For all but the smallest shared and unshared Berkson errors (both 0% or 20%) MCML has bias ~35-70%, with bias particularly severe at 1 Gy.Quasi-2DMC with BMA performs somewhat worse, with bias in excess of ~30% and sometimes in excess of 100% when Berkson errors are large, and bias most pronounced at low dose.Unadjusted regression yields almost as severe bias, which exceeds 50% in many cases for predictions at 1 Gy, and when shared or unshared classical errors are large (50%) often exceeds 100% (Table 4).Bias for FMA is generally moderate to severe, and particularly bad (>100%) when shared Berkson error is large (50%) (Table 4).
Table 5 shows the coefficient mean values, averaged over all 500 simulations, for the linear model, and percentage bias.For most methods bias is modest, generally under ~10%.The only significant exception is FMA where the bias approaches 30% when shared Berkson errors are large (50%) (Table 5).
It should be noted that the behaviour of all regression methods in both scenarios (linear, linear-quadratic) is very similar when Berkson errors are 50%, irrespective of whether unshared Berkson errors are 0, 20% or 50% (Tables 1-5).

Discussion
We have demonstrated that the quasi-2DMC with BMA method performs well when a linear model is assumed and fitted, albeit with coverage for the linear coefficient that is slightly too low (~90%) (Table 2).However, it performs very poorly when a linear-quadratic model is assumed and fitted, with coverage probabilities both for the linear and quadratic dose coefficients that are under 5% when the magnitude of shared Berkson errors is large (50%) (Table 1).For the linear model the bias is generally modest, under 10% (Table 5).However, if a linear-quadratic model is assumed there is substantial bias (by a factor of 10) in estimates of both the linear and quadratic coefficients, with the linear coefficient overestimated and the quadratic coefficient underestimated (Table 3).FMA performs as well as the other method when a linear model is assumed (Table 2), and generally much better assuming a linear-quadratic model, although the coverage probability for the quadratic coefficient is uniformly too high (Table 1).However both linear and quadratic coefficients have pronounced upward bias, particularly when Berkson error is large (50%) (Table 3).By comparison the ERC method yields coverage probabilities that are too high when a linear model is fitted (Table 2), and too low when a linear-quadratic model is fitted with shared and unshared Berkson errors both large (50%) (Table 1), although otherwise it performs well, and coverage is generally better than for quasi-2DMC with BMA or FMA, particularly for the linear-quadratic model.As shown previously it generally outperforms all other methods (regression calibration, MCML, unadjusted regression) when a linear-quadratic model is assumed 32 .The upward bias in estimates of the  coefficient and the downward bias in the estimates of the  coefficient, at least for larger magnitudes of error (Table 3) largely explains the poor coverage of quasi-2DMC with BMA in these cases (Table 1).The bias of the predicted ERR using the linear-quadratic model at a variety of doses is generally smallest for ERC, and largest (apart from unadjusted regression) for quasi-2DMC with BMA and for FMA, with standard regression calibration and MCML exhibiting bias in predicted ERR generally somewhat intermediate between the other two methods (Table 4).The fact that bias in ERR tends to be larger for a dose of 1 Gy (Table 4, Table 5) relates to the fact that at higher assumed dose the contribution of the quadratic coefficient is relatively more important.However, this is not always the case, and for example for the quasi-2DMC with BMA method the inflated linear coefficient and much reduced quadratic coefficient (Table 3) generally result in bias being more severe at the lower dose (Table 4).
As noted above, the form of the quasi-2DMC with BMA model that we fit differs slightly from that employed by Kwon et al 28 .Our method and that of Kwon et al 28 should be approximately equivalent, although the latter is considerably more computationally challenging, and may be the reason why Kwon et al 28 resorted to use of the SAMC method 40 , in order to get their method to work.As noted in the Methods, unfortunately Kwon et al do not provide enough information to infer the precise form of SAMC that was used by them 28 , and it was for that reason we adopted this alternative, which is in any case possibly more computationally efficient.
It is possible that the SAMC implementation used by Kwon et al 28 may behave differently from the more standard implementation of BMA given here.Kwon et al 28 report results of a simulation study that tested the 2DMC with BMA method against what they term "conventional regression", which may have been regression calibration.They did not assess performance against MCML, and in all cases only a linear model was tested 28 unlike the simulations given here and in a previous publication 32 , which used a linear-quadratic model, and in the present paper also a linear model.Kwon et al 28 report generally better performance of 2DMC with BMA against the regression calibration alternative.Their findings using the linear model are consistent with ours (Table 2, Table 5).Kwon et al 36 tested FMA against the 2DMC method and against the so-called corrected information matrix (CIM) method 44 and observed similar performance, in particular adequate coverage, of all three methods, although narrower CI were produced by FMA compared with CIM under a number of scenarios.However, in all cases only a linear model was tested 36 .Set against this, Stram et al reported results of a simulation study 45 which suggested that the 2DMC with BMA method will produce substantially upwardly biased estimates of risk, also that the coverage may be poor, somewhat confirming our own findings, at least using the linearquadratic model; although we do not always find upward bias, and generally little bias when a linear model is assumed, the coverage of both regression coefficients for quasi-2DMC with BMA is poor for larger values of shared Berkson error when a linear-quadratic model is assumed (Table 1, Table 3, Table 4).One possible reason for the bias that may occur in quasi-2DMC with BMA is that by chance a dose vector is chosen which results in good fit by the model (linear or linear-quadratic) but which nevertheless is substantially biased, resulting in substantial bias in the linear and/or quadratic coefficients.While in most circumstances (as in the present simulations) there is no information regarding the dose vector, it might occur that one has an informative prior for the j p , which would be expected to to reduce the likelihood of such bias.
Dose error in radiation studies is unavoidable, even in experimental settings.It is particularly common in epidemiological studies, in particular those of occupationally exposed groups, where shared errors, resulting from group assignments of dose, dosimetry standardizations or possible variability resulting from application of, for example, biokinetic or environmental or biokinetic models result in certain shared unknown (and variable) parameters between individuals or groups.There have been extensive assessments of uncertainties in dose in these settings [46][47][48] .Methods for taking account of such uncertainties cannot always correct for them, but they at least enable error adjustment (e.g. to CI) to be made 31 .As previously discussed 32 the defects in the standard type of regression calibration are well known, in particular that the method can break down when dose error is substantial 31 , as it is in many of our scenarios.It also fails to take account of shared errors.Perhaps because of this a number of methods have been recently developed that take shared error into account, in particular the 2DMC and BMA method 28 and the CIM method 44 .The CIM method only applies to situations where there is pure Berkson error.Both 2DMC with BMA and CIM have been applied in number of settings, the former to analysis of thyroid nodules in nuclear weapons test exposed individuals 35 , and the latter to assessment of lung cancer risk in Russian Mayak nuclear workers 49 and cataract risk in the US Radiologic Technologists 50 .In principle the simulation extrapolation (SIMEX) method 51 can be applied in situations where there is shared (possibly combined with unshared) classical error, where the magnitudes of shared and unshared error are known.However, this was not part of the original formulation of SIMEX 51 .Possibly because of the restrictions on error structure and its extreme computational demands SIMEX has only rarely been used in radiation settings 27,52 .

Conclusions
Using methods and data that exactly parallel those of the previous paper, 32 differning only in that linear as well as linear-quadratic models were assumed, we have demonstrated that the quasi-2DMC with BMA method performs well when a linear model is assumed (Table 2), but very poorly when a linear-quadratic dose response is assumed, with coverage probabilities both for the linear and quadratic dose coefficients that are under 5% when the magnitude of shared Berkson error is moderate to large (Table 1).The bias with a linear model for this method is generally modest (under 10%) (Table 4), but for the linear-quadratic model bias is substantial (by a factor of 10) both for the linear and quadratic coefficients, with the linear coefficient overestimated and the quadratic coefficient underestimated (Table 3).FMA performs generally better, although when assuming a linear-quadratic model the coverage probability for the quadratic coefficient is uniformly too high (Table 1), as it is also for the linear coefficient assuming a linear model (Table 2).However both linear and quadratic coefficients using FMA have pronounced upward bias, particularly when Berkson error is large (50%) (Table 3, Table 4,  4, Table 5).In general ERC performs best in the scenarios presented, and should be the method of choice in situations where there may be substantial shared error, or suspected curvature in the dose response.
given by Eq. (4).We fitted via successive application of Metropolis-Hastings samplers to (a) sample conditionally the model dose response parameters , ,  conditionally on , ,    .Kwon et al 28 did this slightly differently, sampling (a) the dose response parameters , ,    conditional on k , (b) sampling of k (via a multinomial distribution) conditional on , , Dirichlet distribution) conditional on , ,    and k .Kwon et al 28 resorted to use of an approximate Monte Carlo sampler, the so-called stochastic approximation Monte Carlo (SAMC) method of Liang et al40 .Unfortunately Kwon et al do not provide enough information to infer The datasets generated and analysed in the current study are available by running the Fortran 95/2003 program fitter_shared_error_simulation_reg_cal_Bayes_FMA.for, given in the online web repository, with any of the 12 steering input files given there.All are described in Supplement B. The datasets are temporarily stored in computer memory, and the program uses them for fitting the Poisson models described in the Methods section.

Table A1 ,
although for the uppermost dose group we assigned a central estimate of 2 Gy.A composite Berkson-classical error model was used in which the true dose

Table 5 )
32By comparison the recently developed ERC method32yields coverage probabilities that are too high when a linear model is assumed, and too low when a linear-quadratic model is assumed and shared and unshared Berkson errors are both large (50%), although otherwise it performs well, and coverage is generally better than for quasi-2DMC with BMA or FMA, particularly for the linear-quadratic model.The bias of the predicted ERR at a variety of doses is generally smallest for ERC, and largest for quasi-2DMC with BMA and FMA, with standard regression calibration and MCML exhibiting bias in predicted ERR generally somewhat intermediate between the other two methods (Table

Table 1 . Coverage probability of profile likelihood (for extended regression calibration) or Bayesian posterior confidence intervals (for quasi-2DMC with BMA) or frequentist model averaging (FMA) for fits of linear-quadratic model.
32verageprobability evaluated using n=500 dose+cancer ensembles, all with classical error (shared and unshared) of 20%.The central three columns are mostly taken from the paper of Little et al32.

Table 2 . Coverage probability of profile likelihood (for extended regression calibration) or Bayesian posterior confidence intervals (for quasi-2DMC with BMA) or frequentist model averaging (FMA) for fits of linear model.
Coverage probability evaluated using n=500 dose+cancer ensembles, all with classical error (shared and unshared) of 20%.

Table 3 . Mean over n=500 dose+cancer ensembles of regression coefficients in fits of linear-quadratic model, all with classical error (shared and unshared) of 20%.
32e central two columns are mostly taken from the paper of Little et al32.

Table 4 . Percentage bias in excess relative risk (ERR) using linear-quadratic model predicted at 0.1 Gy or 1 Gy with various sorts of dose error correction, using mean regression coefficients over n=500 dose+cancer ensembles (as inTable 2 )
Magnitude of error distributionPercentage bias in excess relative risk estimates by type of dose error correction